Character extraction from documents using wavelet maxima
Internal identifier: 002141 (Main/Exploration); previous: 002140; next: 002142
Authors: Wen L. Hwang [People's Republic of China, Taiwan]; Fu Chang [People's Republic of China]
Source:
- Image and Vision Computing [0262-8856]; 1997.
French descriptors
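- Pascal (Inist): Algorithme, Détection contour, Extraction forme, Lissage, Performance algorithme, Reconnaissance forme, Reconnaissance optique caractère, Seuil, Traitement caractère, Traitement image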
English descriptors
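- KwdEn: Algorithm, Algorithm performance, Character processing, Edge detection, Image processing, Optical character recognition, Pattern extraction, Pattern recognition, Smoothing, Threshold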
Abstract
The extraction of character images is an important front-end processing task in optical character recognition (OCR) and other applications. This process is extremely important because OCR applications usually extract salient features and process them. The existence of noise not only destroys features of characters, but also introduces unwanted features. We propose a new algorithm which removes unwanted background noise from a textual image. Our algorithm is based on the observation that the magnitude of the intensity variation of character boundaries differs from that of noise at various scales of their wavelet transform. Therefore, most of the edges corresponding to the character boundaries at each scale can be extracted using a thresholding method. The internal region of a character is determined by a voting procedure, which uses the arguments of the remaining edges. The interior of the recovered characters is solid, containing no holes. The recovered characters tend to become fattened because of the smoothness applied in the calculation of the wavelet transform. To obtain a quality restoration of the character image, the precise locations of characters in the original image are then estimated using a Bayesian criterion. We also present some experimental results that suggest the effectiveness of our method.
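The thresholding idea in the abstract can be illustrated on a single scanline. The following is only a minimal sketch under stated assumptions, not the authors' algorithm: it uses a sampled derivative of a Gaussian as a stand-in wavelet, works in 1D rather than on an image, and omits the voting procedure and the Bayesian relocation step. The function names, the scales (1, 2, 4), the threshold 0.2, and the 2-pixel tolerance are all hypothetical choices made for illustration.

```python
import math

def wavelet_kernel(scale):
    """Sampled derivative of a Gaussian (assumed stand-in wavelet), scaled so
    that a unit step gives roughly the same response at every scale."""
    radius = int(math.ceil(3 * scale))
    norm = scale ** 2 * math.sqrt(2 * math.pi)
    return [-x / norm * math.exp(-x * x / (2 * scale ** 2))
            for x in range(-radius, radius + 1)]

def transform(signal, scale):
    """Convolve the signal with the kernel, replicating border samples."""
    kernel = wavelet_kernel(scale)
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i - (k - radius), 0), last)
            acc += weight * signal[j]
        out.append(acc)
    return out

def modulus_maxima(coeffs, threshold):
    """Indices where |coeffs| is a local maximum above the threshold."""
    mod = [abs(c) for c in coeffs]
    return [i for i in range(1, len(mod) - 1)
            if mod[i] > threshold
            and mod[i] > mod[i - 1] and mod[i] >= mod[i + 1]]

def persistent_edges(signal, scales=(1, 2, 4), threshold=0.2, tolerance=2):
    """Keep finest-scale maxima that have a thresholded maximum nearby
    (within `tolerance` samples) at every coarser scale."""
    per_scale = [modulus_maxima(transform(signal, s), threshold)
                 for s in scales]
    finest, coarser = per_scale[0], per_scale[1:]
    return [p for p in finest
            if all(any(abs(p - q) <= tolerance for q in maxima)
                   for maxima in coarser)]

# Hypothetical scanline: one ten-pixel-wide stroke plus one noise pixel.
scanline = [0.0] * 64
for x in range(10, 20):
    scanline[x] = 1.0
scanline[45] = 1.0

edges = persistent_edges(scanline)
print(edges)
```

In this toy setting the stroke boundaries respond with nearly constant magnitude at all three scales, while the response to the isolated pixel shrinks as the scale grows, so a single threshold applied across scales separates the two.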
Url: https://api.istex.fr/document/169DF10BF44F302ED8A5331A523BADAD1C8F10F9/fulltext/pdf
DOI: 10.1016/S0262-8856(97)00063-2
Affiliations: Institute of Information Science, Academia Sinica, Taiwan
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000739
- to stream Istex, to step Curation: 000731
- to stream Istex, to step Checkpoint: 001645
- to stream Main, to step Merge: 002258
- to stream PascalFrancis, to step Corpus: 000889
- to stream PascalFrancis, to step Curation: 000B08
- to stream PascalFrancis, to step Checkpoint: 000850
- to stream Main, to step Merge: 002444
- to stream Main, to step Curation: 002141
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title>Character extraction from documents using wavelet maxima</title>
<author><name sortKey="Hwang, Wen L" sort="Hwang, Wen L" uniqKey="Hwang W" first="Wen L." last="Hwang">Wen L. Hwang</name>
</author>
<author><name sortKey="Chang, Fu" sort="Chang, Fu" uniqKey="Chang F" first="Fu" last="Chang">Fu Chang</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:169DF10BF44F302ED8A5331A523BADAD1C8F10F9</idno>
<date when="1998" year="1998">1998</date>
<idno type="doi">10.1016/S0262-8856(97)00063-2</idno>
<idno type="url">https://api.istex.fr/document/169DF10BF44F302ED8A5331A523BADAD1C8F10F9/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000739</idno>
<idno type="wicri:Area/Istex/Curation">000731</idno>
<idno type="wicri:Area/Istex/Checkpoint">001645</idno>
<idno type="wicri:doubleKey">0262-8856:1998:Hwang W:character:extraction:from</idno>
<idno type="wicri:Area/Main/Merge">002258</idno>
<idno type="wicri:source">INIST</idno>
<idno type="RBID">Pascal:98-0263688</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000889</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000B08</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000850</idno>
<idno type="wicri:doubleKey">0262-8856:1998:Hwang W:character:extraction:from</idno>
<idno type="wicri:Area/Main/Merge">002444</idno>
<idno type="wicri:Area/Main/Curation">002141</idno>
<idno type="wicri:Area/Main/Exploration">002141</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a">Character extraction from documents using wavelet maxima</title>
<author><name sortKey="Hwang, Wen L" sort="Hwang, Wen L" uniqKey="Hwang W" first="Wen L." last="Hwang">Wen L. Hwang</name>
<affiliation wicri:level="1"><country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Institute of Information Science, Academia Sinica, Taiwan</wicri:regionArea>
<wicri:noRegion>Taiwan</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Taïwan</country>
</affiliation>
</author>
<author><name sortKey="Chang, Fu" sort="Chang, Fu" uniqKey="Chang F" first="Fu" last="Chang">Fu Chang</name>
<affiliation wicri:level="1"><country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Institute of Information Science, Academia Sinica, Taiwan</wicri:regionArea>
<wicri:noRegion>Taiwan</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Image and Vision Computing</title>
<title level="j" type="abbrev">IMAVIS</title>
<idno type="ISSN">0262-8856</idno>
<imprint><publisher>ELSEVIER</publisher>
<date type="published" when="1997">1997</date>
<biblScope unit="volume">16</biblScope>
<biblScope unit="issue">5</biblScope>
<biblScope unit="page" from="307">307</biblScope>
<biblScope unit="page" to="315">315</biblScope>
</imprint>
<idno type="ISSN">0262-8856</idno>
</series>
<idno type="istex">169DF10BF44F302ED8A5331A523BADAD1C8F10F9</idno>
<idno type="DOI">10.1016/S0262-8856(97)00063-2</idno>
<idno type="PII">S0262-8856(97)00063-2</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0262-8856</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithm</term>
<term>Algorithm performance</term>
<term>Character processing</term>
<term>Edge detection</term>
<term>Image processing</term>
<term>Optical character recognition</term>
<term>Pattern extraction</term>
<term>Pattern recognition</term>
<term>Smoothing</term>
<term>Threshold</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Algorithme</term>
<term>Détection contour</term>
<term>Extraction forme</term>
<term>Lissage</term>
<term>Performance algorithme</term>
<term>Reconnaissance forme</term>
<term>Reconnaissance optique caractère</term>
<term>Seuil</term>
<term>Traitement caractère</term>
<term>Traitement image</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">The extraction of character images is an important front-end processing task in optical character recognition (OCR) and other applications. This process is extremely important because OCR applications usually extract salient features and process them. The existence of noise not only destroys features of characters, but also introduces unwanted features. We propose a new algorithm which removes unwanted background noise from a textual image. Our algorithm is based on the observation that the magnitude of the intensity variation of character boundaries differs from that of noise at various scales of their wavelet transform. Therefore, most of the edges corresponding to the character boundaries at each scale can be extracted using a thresholding method. The internal region of a character is determined by a voting procedure, which uses the arguments of the remaining edges. The interior of the recovered characters is solid, containing no holes. The recovered characters tend to become fattened because of the smoothness applied in the calculation of the wavelet transform. To obtain a quality restoration of the character image, the precise locations of characters in the original image are then estimated using a Bayesian criterion. We also present some experimental results that suggest the effectiveness of our method.</div>
</front>
</TEI>
<affiliations><list><country><li>République populaire de Chine</li>
<li>Taïwan</li>
</country>
</list>
<tree><country name="République populaire de Chine"><noRegion><name sortKey="Hwang, Wen L" sort="Hwang, Wen L" uniqKey="Hwang W" first="Wen L." last="Hwang">Wen L. Hwang</name>
</noRegion>
<name sortKey="Chang, Fu" sort="Chang, Fu" uniqKey="Chang F" first="Fu" last="Chang">Fu Chang</name>
</country>
<country name="Taïwan"><noRegion><name sortKey="Hwang, Wen L" sort="Hwang, Wen L" uniqKey="Hwang W" first="Wen L." last="Hwang">Wen L. Hwang</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002141 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002141 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:169DF10BF44F302ED8A5331A523BADAD1C8F10F9 |texte= Character extraction from documents using wavelet maxima }}
This area was generated with Dilib version V0.6.32.